The group lasso for logistic regression
Abstract
The group lasso is an extension of the lasso that performs variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm for solving the corresponding convex optimization problem; the algorithm is especially suitable for high-dimensional problems and can also be applied to generalized linear models. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than the sample size, provided the true underlying structure is sparse. We further use a two-stage procedure that aims for sparser models than the group lasso, leading to improved prediction performance in some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are applied to simulated and real data sets on splice site detection in DNA sequences.
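To make the estimator concrete, here is a minimal sketch of group-lasso logistic regression in Python/NumPy. The abstract does not spell out the algorithm, so this sketch substitutes plain proximal gradient descent (ISTA), which solves the same kind of convex problem but is not necessarily the authors' method. It assumes labels in {0, 1}, an unpenalized intercept, and the common choice of weighting each group's penalty by the square root of the group size; the function names, the toy data, and the value lam=0.1 are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def group_lasso_logistic(X, y, groups, lam, n_iter=5000, tol=1e-10):
    """Sketch: group-lasso logistic regression via proximal gradient (ISTA).

    Minimizes  -(1/n) * loglik(b0, beta) + lam * sum_g sqrt(|g|) * ||beta_g||_2
    for y in {0, 1}. `groups` is a list of index arrays partitioning the
    columns of X; the intercept b0 is left unpenalized (an assumption).
    """
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    # The Hessian of the average negative log-likelihood is (1/n) Z^T W Z with
    # W <= 1/4, so 1/L = 4n / ||Z||_2^2 is a safe fixed step size.
    Z = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2
    weights = [np.sqrt(len(g)) for g in groups]  # hypothetical sqrt(group size) weights
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta) - y       # d(-loglik)/d(eta) per observation
        b0_new = b0 - step * resid.mean()        # plain gradient step on the intercept
        z = beta - step * (X.T @ resid) / n      # gradient step on beta
        beta_new = np.zeros(p)
        for g, w in zip(groups, weights):
            norm_g = np.linalg.norm(z[g])
            thresh = step * lam * w
            if norm_g > thresh:
                # Groupwise soft-thresholding: shrink the whole block toward zero.
                beta_new[g] = (1.0 - thresh / norm_g) * z[g]
            # Otherwise the entire group is set exactly to zero.
        # Euclidean (rotation-invariant) size of the update, used for stopping.
        change = np.sqrt((b0_new - b0) ** 2 + np.sum((beta_new - beta) ** 2))
        b0, beta = b0_new, beta_new
        if change < tol:
            break
    return b0, beta


# Hypothetical toy data: three groups of three predictors; only the first is active.
rng = np.random.default_rng(0)
n = 400
X = rng.standard_normal((n, 9))
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
eta_true = X[:, :3] @ np.array([2.0, -2.0, 1.5])
y = (rng.random(n) < sigmoid(eta_true)).astype(float)

b0_hat, beta_hat = group_lasso_logistic(X, y, groups, lam=0.1)
print([float(np.linalg.norm(beta_hat[g])) for g in groups])
```

Because the penalty depends on each block only through its Euclidean norm, replacing a block of columns X_g by X_g Q for an orthogonal Q leaves the fitted linear predictor unchanged, which is the groupwise orthogonal invariance the abstract mentions. Groups whose gradient norm stays below lam times the group weight are zeroed out as a whole, which is how the method selects entire groups rather than single coefficients.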
Similar resources
Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman
Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...
Non-asymptotic Oracle Inequalities for the Lasso and Group Lasso in high dimensional logistic model
We consider the problem of estimating a function f0 in a logistic regression model. We propose to estimate this function f0 by a sparse approximation built as a linear combination of elements of a given dictionary of p functions. This sparse approximation is selected by the Lasso or Group Lasso procedure. In this context, we state non-asymptotic oracle inequalities for the Lasso and Group Lasso under...
Logistic Regression with Structured Sparsity
Binary logistic regression with a sparsity constraint on the solution plays a vital role in many high dimensional machine learning applications. In some cases, the features can be grouped together, so that entire subsets of features can be selected or zeroed out. In many applications, however, this can be very restrictive. In this paper, we are interested in a less restrictive form of structure...
Sparse multi-class prediction based on the Group Lasso in multinomial logistic regression
Preface: Many classification procedures are based on variable selection methodologies. This master's thesis concentrates on continuous variable selection procedures based on the shrinkage principle. Generally, we would like to find sparse prediction rules for multi-class classification problems that increase not only the prediction accuracy but also the interpretability of the obtained predict...
Classification with Sparse Overlapping Groups
Binary logistic regression with a sparsity constraint on the solution plays a vital role in many high dimensional machine learning applications. In some cases, the features can be grouped together, so that entire subsets of features can be selected or zeroed out. In many applications, however, this can be very restrictive. In this paper, we are interested in a less restrictive form of structure...